A reliable cluster detection technique using photometric 
redshifts: introducing the 2TecX algorithm 

ABSTRACT 

We present a new cluster detection algorithm designed for finding high-redshift clus- 
ters using optical/infrared imaging data. The algorithm has two main characteristics. 
First, it utilises each galaxy's full redshift probability function, instead of an estimate 
of the photometric redshift based on the peak of the probability function and an asso- 
ciated Gaussian error. Second, it identifies cluster candidates through cross-checking 
the results of two substantially different selection techniques
(the name 2TecX representing the cross-check of the two techniques).
These are adaptations of the Voronoi Tessellation and
Friends-Of-Friends methods. Monte-Carlo simulations of mock
catalogues show that cross-checking the cluster candidates found
by the two techniques significantly reduces the detection of
spurious sources. Furthermore, we examine the selection effects
and the relative strengths and weaknesses of each method. The
simulations also allow us to fine-tune the algorithm's parameters,
and define completeness 
and mass limit as a function of redshift. We demonstrate that the
algorithm isolates high-redshift clusters with high efficiency
and low contamination.

1 INTRODUCTION 

Remote galaxy clusters have been used in a wide range of 
cosmological and astrophysical contexts. In cosmology, clus- 
ters can be used to trace the large-scale structure of the uni- 
verse. Their number density, as a function of redshift, can 
place constraints on various cosmological quantities. These 
include the mass density of the universe, the amplitude 
of the initial density fluctuations, and the cosmic growth 
function. Clusters also act as astrophysical laboratories for 
understanding the formation and evolution of galaxies and 
their environments. This is because the deep potential well 
of a cluster causes it to retain virtually all its gas and galax- 
ies, allowing a detailed inspection of the interaction between 
both. It is therefore desirable to have a large, homogeneous 
catalogue of clusters at a range of redshifts in the universe. 

Abell compiled the first large cluster catalogue, in which 
clusters were selected in a consistent manner (Abell 1958; 
Abell, Corwin & Olowin 1989). This catalogue was created
from photographic observations, which suffer from non-linear
plate-to-plate sensitivity variations and considerable
photometric errors (Sutherland 1988). Furthermore,
the clusters were found by eye, which poses problems for the
objectivity and completeness of the cluster sample and the 
line-of-sight projections contaminating it (e.g. Lucey 1983; 
van Haarlem, Frenk & White 1997). A particularly impor- 
tant advance has come from optical galaxy surveys using 
large arrays of CCD detectors, such as the relatively shallow 
(z < 0.4) Sloan Digital Sky Survey (SDSS) (e.g. Goto et al. 
2002; Kim et al. 2002; Miller et al. 2005). A recent large-scale 
cluster catalogue using the SDSS was initiated by Koester 
et al. (2007a,b), detecting ~ 1400 clusters at 0.1 < z < 0.3. 
There have been numerous smaller-area surveys to much 
higher redshift, as for instance the Palomar Distant Clus- 
ter Survey (Postman et al. 1996); the ESO Imaging Survey 
(Lobo et al. 2000); and the Red Sequence Cluster Survey 
(Gladders & Yee 2005). 



Optical cluster surveys were limited for a long time to
clusters at $z \lesssim 1$, because the cluster galaxy
population largely consists of early-type red galaxies. At
redshifts of $z \gtrsim 1$, the 4000 Å break moves into infrared
bands, complicating the detection of these galaxies in optical
surveys. A crucial development has been the advent of wide-field
infrared cameras. Deep, large-area infrared studies have
already become available from the Wide Field Infrared Camera
(WFCAM) on the United Kingdom Infra-Red Telescope
(UKIRT) and the Spitzer space telescope and will shortly be
available on the Visible and Infrared Survey Telescope for
Astronomy (VISTA). 

There exist many methods for detecting clusters in 
optical imaging surveys. The problem is somewhat easier 
for galaxy datasets with spectroscopic redshifts owing to 
the accurate knowledge of each galaxy's distance. However,
spectroscopy is time-consuming, and approximate redshifts
can instead be calculated via photometric redshift estimation. This
technique is considerably less precise, which makes looking
for structure less straightforward. A successful photometric
method for finding clusters is to use deep optical imaging
data that span the rest-frame 4000 Å break (Gladders &
Yee 2000). This is motivated by the observation that cluster 
early-type galaxies form a characteristic red sequence com- 
prising the brightest, reddest galaxies at a given redshift. 
The colour of this red sequence also provides an estimate of 
the redshift of the detected cluster, thereby reducing
projection effects (e.g. Gladders & Yee 2005). However, at
high redshift there is not yet substantial evidence that
all clusters show a red sequence. Selecting clusters
solely by this characteristic could introduce a large bias
against younger clusters with ongoing star-formation. 

In this paper we present a new cluster detection method, 
specifically designed to detect high-redshift clusters using 
optical/infrared imaging data. In Section 2 we describe the 
cluster detection algorithm step by step. Section 3 contains 
details of the creation of mock catalogues, along with sim- 
ulations for parameter optimisation and to determine the 
completeness and contamination by spurious sources. Sec- 
tion 4 is a summary of the algorithm and its performance 
on the set of simulations. We assume throughout this paper
that $h = H_0 / (100\,\mathrm{km\,s^{-1}\,Mpc^{-1}}) = 0.7$, and an $\Omega_M = 0.3$,
$\Omega_\Lambda = 0.7$ cosmology. All magnitudes are given in the Vega
system. 



2 2TECX: A NEW CLUSTER DETECTION 
ALGORITHM 

Optical cluster surveys using selection methods based on
photometric redshifts often suffer from two common problems:
(i) projection effects of foreground and background galaxies
and (ii) determining the reality of detected clusters. The
former issue arises because photometric redshifts, as opposed
to spectroscopic redshifts, typically have errors of the order
of $\sigma \sim 0.1$; furthermore, the photometric redshift probability
functions (z-PDFs) are often significantly non-Gaussian
and can, for instance, show double peaks. The second issue,
the occurrence of spurious cluster detections, is due to the
sensitivity of the detection algorithm to noisy data. To create
a cluster catalogue, a compromise needs to be made between
completeness and contamination: we want to include
as many clusters as possible above a certain mass limit, without
suffering from contamination by spurious sources. It is
therefore important to understand the completeness and
efficiency of cluster finders. 

To address these two problems, we create a new cluster- 
detection algorithm that is characterised by two main im- 
provements upon previous work: (i) the cluster-detection 
algorithm utilises the full z-PDF instead of a single best 
redshift-estimate with an associated Gaussian error; (ii) we
maximise the efficiency by cross-checking the output of two
substantially different cluster detection methods.

Figure 1. Schematic diagram of the cluster-selection algorithm.
Each of the steps is described in detail in Sections 2.1 to 2.4.

The algorithm is divided into six steps, described in
more detail in the following subsections and shown
schematically in Fig. 1:

(i) Determining z-PDFs for all galaxies in the field. 

(ii) Creating 500 Monte-Carlo (MC) realisations of the 
three-dimensional galaxy distribution, based on the galaxy 
z-PDFs. 

(iii) Dividing each MC-realisation into redshift slices of
$\Delta z = 0.05$ over the range $0.1 < z < 2.0$. 

(iv) Detecting cluster candidates in each slice of all MC- 
realisations using independent Voronoi Tessellation (VT) 
and Friends-Of-Friends (FOF) methods. 

(v) Mapping the probability of cluster candidates for both 
methods based on the number of MC-realisations in which 
they occur. 

(vi) Cross-checking the output of the VT and FOF meth- 
ods to arrive at the final cluster-catalogue. 



2.1 Redshift probability distribution functions 

The photometric redshifts of Van Breukelen et al. (2006, 
henceforth VB06), who first applied our cluster-detection 
algorithm to optical/infrared imaging data, were created by 
an adapted version of Hyperz (Bolzonella et al. 2000), using a
set of Spectral Energy Distributions (SEDs) generated with
GALAXEV (Bruzual & Charlot 2003). Hyperz estimates
photometric redshifts by fitting a range of SED templates
to the measured fluxes in several photometric bands. The
shape of the SEDs is determined by various parameters,
such as the rate of ongoing star-formation, the age of the
galaxy, the metallicity, and the reddening due to extinction. 
A redshift probability distribution function is constructed by 
calculating the probability of the best-fitting set of param- 
eters at each redshift. Thus the z-PDF does not reflect the 
probability with redshift for a single template, but rather for 
the total set of templates. The location of the maximum of 
the z-PDF is taken as the photometric redshift and an error 
can be estimated by fitting a Gaussian profile to the prob- 
ability peak. However, this does not take into account the 
often non-Gaussian and sometimes double-peaked nature of 
the z-PDF. These can arise because different features of the
spectrum can be confused (for example the 4000 Å break
and the Lyman-α break at ~1000 Å) or because various templates
can give solutions of comparable probability at different
redshifts. We therefore do not use a best-estimate photometric 
redshift, but take the entire z-PDF into account in our clus- 
ter search. The output of our adapted Hyperz program is the 
marginalised likelihood associated with each step in redshift 
space for each galaxy. However, the 2TecX algorithm can be 
applied to any photometric redshift dataset that contains a 
z-PDF for every galaxy. 

Figure 2. An example of Voronoi Tessellations. The dots represent
the nuclei, randomly distributed over the field. Each Voronoi
Cell encloses all points in the field that are closer to its nucleus
than to any other nucleus. For example, all points within the filled
(red) Voronoi Cell are closer to the nucleus marked by the star
symbol than to any of the dots. 



2.2 The Monte-Carlo realisations and redshift 
slicing 

To include the entire z-PDF of each galaxy into our cluster- 
detection algorithm, we create 500 MC-realisations of the 
three-dimensional galaxy distribution by randomly sampling 
each z-PDF. We chose the number of realisations as a com- 
promise between computational time and sampling accu- 
racy of the z-PDF. We now have 500 cubes of RA, Dec, 
and z, where each galaxy is represented by a single point. 
The shape of the z-PDF of each galaxy determines its posi- 
tion in the cubes; if the peak in the probability distribution 
function is sharp the galaxy will occur in all cubes at ap- 
proximately the same redshift whereas if the z-PDF consists 
of two equally probable peaks the galaxy will be placed at 
either redshift in an equal number of cubes. 
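
The sampling step can be sketched as follows. This is a minimal illustration, assuming that each z-PDF is tabulated as a list of (not necessarily normalised) likelihoods on a common redshift grid; the function names `sample_redshift` and `make_realisations` and the input format are hypothetical and are not taken from the published code.

```python
import bisect
import random

def sample_redshift(z_grid, pdf, rng):
    """Draw one redshift from a tabulated z-PDF by inverse-CDF sampling."""
    # Build the unnormalised cumulative distribution over the redshift grid.
    cdf = []
    total = 0.0
    for p in pdf:
        total += p
        cdf.append(total)
    # Draw a uniform deviate in [0, total) and locate the matching grid step.
    u = rng.random() * total
    idx = bisect.bisect_right(cdf, u)
    return z_grid[min(idx, len(z_grid) - 1)]

def make_realisations(z_grid, pdfs, n_real=500, seed=0):
    """Return n_real lists holding one sampled redshift per galaxy."""
    rng = random.Random(seed)
    return [[sample_redshift(z_grid, pdf, rng) for pdf in pdfs]
            for _ in range(n_real)]

# Illustrative input: one sharply peaked and one double-peaked z-PDF.
z_grid = [0.5, 1.0, 1.5]
pdfs = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
realisations = make_realisations(z_grid, pdfs, n_real=500, seed=1)
```

With this input the first galaxy lands at z = 1.0 in every realisation, whereas the second is split roughly evenly between z = 0.5 and z = 1.5, which is the behaviour described above.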

Next, we divide each MC-realisation into redshift slices
of a width, $\Delta z$, approximately equal to the photometric
redshift error, $\sigma_z$. If the width is chosen to be significantly
smaller, clusters can go undetected because their member
galaxies are spread over too many redshift slices; if
it is chosen substantially larger, many spurious sources will
be found owing to projection effects. In this paper we use
$\Delta z = 0.05$, as this is the approximate photometric redshift
error of VB06. 
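
A minimal sketch of the slicing step is given below, assuming a realisation is simply a list of sampled redshifts indexed by galaxy. The helper names `slice_index` and `slice_realisation` are hypothetical, and the defaults mirror the slice width and redshift range quoted in the text.

```python
def slice_index(z, z_min=0.1, z_max=2.0, dz=0.05):
    """Index of the redshift slice containing z, or None if z is out of range."""
    if z < z_min or z >= z_max:
        return None
    n_slices = round((z_max - z_min) / dz)
    # The small offset guards against round-off at slice boundaries.
    return min(int((z - z_min) / dz + 1e-9), n_slices - 1)

def slice_realisation(redshifts, z_min=0.1, z_max=2.0, dz=0.05):
    """Group the galaxy indices of one MC-realisation by redshift slice."""
    n_slices = round((z_max - z_min) / dz)
    slices = [[] for _ in range(n_slices)]
    for gal_id, z in enumerate(redshifts):
        k = slice_index(z, z_min, z_max, dz)
        if k is not None:
            slices[k].append(gal_id)
    return slices

# Illustrative redshifts; the last two fall outside the slicing range.
slices = slice_realisation([0.12, 0.15, 1.99, 2.5, 0.05])
```

Galaxies whose sampled redshift falls outside the range are simply left out of that realisation in this sketch; how the published code treats them is not stated in the text.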

2.3 Two cluster selection methods 

We now have 500 MC-realisations of the three-dimensional 
galaxy distribution, each divided into redshift slices. In the 
next step, the algorithm applies two cluster selection meth- 
ods independently to each redshift slice of all the MC- 
realisations. The two methods used are Voronoi Tessellation 
and Friends-Of-Friends, which are described in more detail 
below. 

Figure 3. Histogram of Voronoi Cell densities in a field of 2000
randomly distributed background galaxies including a central
overdensity of 100 galaxies with a Gaussian density profile with
$\sigma = 1'$. The dashed line is placed at $f = \langle f \rangle$ and the dotted line
denotes the position of the peak, which is at $f_{\max} = \frac{4}{5} \langle f \rangle$.



2.3.1 Voronoi Tessellations 

The VT technique divides a field of galaxies into Voronoi 
Cells, each containing one object: the nucleus. All points 
that are closer to this nucleus than any of the other nuclei are 
enclosed by the Voronoi Cell (see Fig. 2). This technique was
first applied to the modelling of large-scale structure (e.g.
Icke & van de Weygaert 1987) but has more recently been
used in cluster detection (Ebeling & Wiedenmann 1993; Kim
et al. 2002; Lopes et al. 2004). One of the principal advantages
of the VT method is that the technique is relatively
unbiased as it does not look for a particular source geometry
(e.g. Ramella 2001). The parameter of interest is the area of
the VT cells, the reciprocal of which translates to a density.
Overdense regions in the plane are found by fitting a function
to the density distribution of all VT cells in the field;
cluster candidates are the groups of cells of a significantly
higher density than the mean background density.

Figure 4. Left: Voronoi Tessellations on a field of background sources with a central overdensity superimposed. The background consists
of 2000 galaxies uniformly distributed throughout the field. The central structure comprises 100 galaxies and has a Gaussian density
distribution with $\sigma = 1'$. The blue cells denote the cells with density $\tilde{f} > \tilde{f}_{\min}$. The red cells compose the group that also satisfies the
$n_{\mathrm{gal}} > n_{\mathrm{lim}}$ criterion. Note that none of the high-density background fluctuations (blue cells) are selected as cluster candidates. Right:
The cumulative density distribution of the data in the field shown on the left. The red dashed line is the fit to the lower-density cells
according to Eq. 3. The dotted vertical line shows the value of $\tilde{f}_{\min} = f_{\min} / \langle f \rangle$, the minimum density above which high-density cells are
selected (see Section 3.4 for a discussion of the value of this parameter).

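Kiang (1966) showed that, for randomly (Poissonian)
distributed points, the differential distribution function of
the cell area is of the following form:

$$dp(\tilde{a}) = \frac{4^4}{\Gamma(4)}\,\tilde{a}^3 e^{-4\tilde{a}}\,d\tilde{a}. \qquad (1)$$

Here $\tilde{a} = a / \langle a \rangle$ is the dimensionless cell area in units
of the average cell area, $\langle a \rangle = \frac{1}{N}\sum_{i=1}^{N} a_i$, where $N$ is the
total number of cells. $\Gamma(x)$ is the Gamma Function. The
cumulative distribution function for the cell area $\tilde{a}$ is the
integral of Eq. 1, namely:

$$P(\tilde{a}) = 1 - e^{-4\tilde{a}}\left(\frac{32\tilde{a}^3}{3} + 8\tilde{a}^2 + 4\tilde{a} + 1\right). \qquad (2)$$

The density of the VT cells is the reciprocal of the cell area, so the
corresponding cumulative distribution follows from Eq. 2:

$$P(\tilde{f}) = e^{-4/\tilde{f}}\left(\frac{32}{3\tilde{f}^3} + \frac{8}{\tilde{f}^2} + \frac{4}{\tilde{f}} + 1\right). \qquad (3)$$

Here $\tilde{f}$ is the dimensionless cell density (the inverse of the
cell area) in units of the mean cell density:

$$\tilde{f} = f / \langle f \rangle = \langle a \rangle / a. \qquad (4)$$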

In our algorithm, we approximate the density distribution of 
the background galaxies by a Poissonian distribution, allow- 
ing us to fit the cumulative density distribution of the data 
with a function of the form of Eq. 3. Note, however, that owing
to this approximation, the derived equations in this section
do not reflect the exact statistics of the galaxy background.
Nevertheless, by tuning the parameters through simulations (see
Section 3.4), the resulting statistical approximation is adequate
for our purposes. 
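
The distribution functions of Eqs. 1 to 3 are simple to evaluate numerically. The sketch below is an illustration only; the function names are hypothetical, and `density_pdf` is the derivative of Eq. 3 with respect to the dimensionless density, which is not written out in the text.

```python
import math

def area_pdf(a):
    """Eq. 1: differential distribution of the dimensionless cell area."""
    return 4.0 ** 4 / math.gamma(4.0) * a ** 3 * math.exp(-4.0 * a)

def area_cdf(a):
    """Eq. 2: fraction of cells with dimensionless area below a."""
    poly = 32.0 * a ** 3 / 3.0 + 8.0 * a ** 2 + 4.0 * a + 1.0
    return 1.0 - math.exp(-4.0 * a) * poly

def density_cdf(f):
    """Eq. 3: fraction of cells with dimensionless density below f."""
    poly = 32.0 / (3.0 * f ** 3) + 8.0 / f ** 2 + 4.0 / f + 1.0
    return math.exp(-4.0 / f) * poly

def density_pdf(f):
    """Derivative of Eq. 3 with respect to f (derived here, not in the text)."""
    return 4.0 ** 4 / math.gamma(4.0) * f ** -5 * math.exp(-4.0 / f)

def integrate_area_pdf(a_max, n_steps=10000):
    """Trapezoidal integral of Eq. 1 from 0 to a_max, as a check on Eq. 2."""
    h = a_max / n_steps
    total = 0.5 * (area_pdf(0.0) + area_pdf(a_max))
    for i in range(1, n_steps):
        total += area_pdf(i * h)
    return total * h
```

Two consistency checks follow directly: `area_cdf(1 / f) + density_cdf(f)` equals one for any positive `f`, because a cell with density below `f` has area above `1 / f`, and `density_pdf` reaches its maximum at a dimensionless density of 0.8.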

The aim of the fitting procedure is to calculate the aver- 
age density of the background cells, so we can subsequently 
impose a lower limit on the density of the cells that are 
caused by clustering. However, we can only fit the function 
to the lower-density end of the distribution which is not 
influenced by the cells in the overdense regions. Therefore 
we first estimate the background density by inspecting the 
histogram of the cell densities. 

Fig. 3 shows the VT cell density distribution in a field
of 2000 randomly distributed background galaxies, containing
a structure of 100 galaxies in the centre with a Gaussian
density profile with $\sigma = 1'$ (see also Fig. 4). If we assume the 
peak in this histogram is not polluted by the overdense regions,
the form of Eq. 1 dictates that the average background
density is $\frac{5}{4}$ times the density at which the peak occurs. This
can be shown by applying Eq. 4 to express Eq. 1 in terms of the
cell density and requiring that its derivative is zero. Next we can fit the predicted cumulative 
distribution function to the cumulative distribution function
of our data where $\tilde{f}_{\mathrm{estimated}} \leq 0.8$, as suggested by Ebeling
& Wiedenmann (1993). Once the exact background density
is known, we isolate all cells with $\tilde{f} > \tilde{f}_{\min}$; $\tilde{f}_{\min}$ is the
density at which overdense regions start to contribute significantly
to the cumulative density distribution. Adjoining
high-density cells are grouped together; if the group consists
of a number of cells greater than a certain lower limit, it is
taken to be a cluster candidate. Fig. 4 illustrates this procedure:
the Voronoi tessellated field is shown on the left along
with the high-density groups and the cluster candidate; on
the right the cumulative density distribution is plotted. The
limiting number of galaxies, $n_{\mathrm{lim}}$, can be calculated by setting
a limit on $N_{\exp}$: the expected number of groups
caused by background fluctuations. Ebeling & Wiedenmann
(1993) derived this quantity as described below in Eqs. 5-8.
Note that we use the lower case notation $n$ for numbers
of individual Voronoi Cells (each representing a galaxy), and
the capital $N$ for numbers of high-density groups of Voronoi
Cells (corresponding to cluster candidates).

Figure 5. An example of Delaunay triangulation (Delaunay
1934). The dots represent galaxies in the field, which is the same
as in Fig. 2. The filled (red) triangle and circle demonstrate the
definition of the Delaunay Triangulation: the circumcircle of any
triangle contains no other points than the vertices of the triangle
itself. 

The expected number of groups caused by background
fluctuations, comprising a certain number of galaxies above
the background level, $n_{\mathrm{gal}}$, can be written as:

$$N_{\mathrm{fluct}}(\tilde{f}_{\min}, n_{\mathrm{gal}}) = n_{\mathrm{bg}}\, N_{\mathrm{fluct}}(\tilde{f}_{\min}, 0)\, e^{-b(\tilde{f}_{\min})\, n_{\mathrm{gal}}}, \qquad (5)$$

where $\tilde{f}_{\min}$ is the minimum density cut-off value used to select
high-density cells, and $n_{\mathrm{bg}}$ is the number of background
galaxies expected in the field. The latter comes directly from
the fitted average background density $\langle f \rangle$ by recognising
that $\langle a \rangle = 1 / \langle f \rangle$ and therefore $n_{\mathrm{bg}} = A / \langle a \rangle$, where
$A$ is the total area of the survey field. $N_{\mathrm{fluct}}(\tilde{f}_{\min}, 0)$ is the
number of high-density groups with no extra galaxies above
the background level; $N_{\mathrm{fluct}}(\tilde{f}_{\min}, 0)$ and $b$ have been shown
by Ebeling & Wiedenmann (1993) to obey the following
empirical relations:

$$N_{\mathrm{fluct}}(\tilde{f}_{\min}, 0) = 0.047\, \tilde{f}_{\min} - 0.04, \qquad (6)$$

$$b(\tilde{f}_{\min}) = 0.62\, \tilde{f}_{\min} - 0.45. \qquad (7)$$

Integrating the function given in Eq. 5 from the limiting
number of galaxies to infinity gives the expected number of
groups caused by background fluctuations with $n_{\mathrm{gal}} > n_{\mathrm{lim}}$:

$$N_{\exp}(\tilde{f}_{\min}, n_{\mathrm{gal}} > n_{\mathrm{lim}}) = n_{\mathrm{bg}}\, \frac{N_{\mathrm{fluct}}(\tilde{f}_{\min}, 0)}{b(\tilde{f}_{\min})}\, e^{-b(\tilde{f}_{\min})\, n_{\mathrm{lim}}}. \qquad (8)$$

Thus, the limiting number of galaxy members in a group
considered to be a cluster candidate is:

$$n_{\mathrm{lim}} = \frac{1}{b(\tilde{f}_{\min})} \ln\left[\frac{n_{\mathrm{bg}}\, N_{\mathrm{fluct}}(\tilde{f}_{\min}, 0)}{b(\tilde{f}_{\min})\, N_{\exp}}\right]. \qquad (9)$$

Figure 6. The Friends-Of-Friends detection method applied to
a field of background sources with a central overdensity
superimposed, exactly as in Fig. 4 (left). The background consists of
2000 galaxies uniformly distributed throughout a $0.5 \times 0.5\,\mathrm{deg}^2$
field; the central structure comprises 100 galaxies and has a
Gaussian density distribution with $\sigma = 1'$. Only the central
$0.05 \times 0.05\,\mathrm{deg}^2$ is shown for clarity. The FOF algorithm was run
with a linking distance of $D_{\mathrm{link}} = 175$ kpc, with the simulated
slice being at z = 0.5. The colours of the galaxies and links
reflect the iteration of the algorithm: the red galaxy was chosen
first, the orange ones are its 'friends', the yellow ones are
'friends-of-friends', etc. 



The number of galaxy members, $n_{\mathrm{gal}}$, is determined for each
group and compared to $n_{\mathrm{lim}}$. Note that $n_{\mathrm{gal}}$ needs to be
corrected for the background number density of galaxies,
which is calculated by dividing the total area of the group,
$A_t$, by the average cell area: $n_{\mathrm{gal,bg}} = A_t / \langle a \rangle$.
The Voronoi Tessellations method thus has two parameters
for which a value needs to be chosen: the minimum cut-off
dimensionless density $\tilde{f}_{\min}$ and the maximum expected
number of groups caused by background fluctuations, $N_{\exp}$. 
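
As a worked illustration of how the two VT parameters translate into a membership threshold, the sketch below evaluates Eqs. 6 to 8 and inverts Eq. 8 for the limiting number of galaxies. The function names are hypothetical, the constant in Eq. 6 is assumed to enter with a minus sign as in Ebeling & Wiedenmann (1993), and the input values (a cut-off of 1.5, 2000 background galaxies and an expected 0.1 spurious groups) are purely illustrative rather than the tuned values of the paper.

```python
import math

def n_fluct_zero(f_min):
    """Eq. 6: empirical amplitude (sign of the constant assumed negative)."""
    return 0.047 * f_min - 0.04

def b_slope(f_min):
    """Eq. 7: empirical exponential slope."""
    return 0.62 * f_min - 0.45

def n_expected(f_min, n_lim, n_bg):
    """Eq. 8: expected number of background groups with n_gal > n_lim."""
    b = b_slope(f_min)
    return n_bg * n_fluct_zero(f_min) / b * math.exp(-b * n_lim)

def n_limit(f_min, n_exp, n_bg):
    """Eq. 8 solved for n_lim, given the tolerated number of spurious groups."""
    b = b_slope(f_min)
    return math.log(n_bg * n_fluct_zero(f_min) / (b * n_exp)) / b

# Illustrative parameter values only.
n_lim = n_limit(f_min=1.5, n_exp=0.1, n_bg=2000)
```

For these illustrative inputs the threshold comes out at roughly 15 galaxies above the background; tolerating fewer spurious groups (a smaller `n_exp`) raises the threshold.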



2.3.2 Friends-Of-Friends 

Friends-Of-Friends algorithms are commonly used in spec- 
troscopic galaxy surveys (e.g. Tucker et al. 2002; Ramella et 
al. 2002). A variant of this algorithm utilising photometric 
redshifts was proposed by Botzler et al. (2004) . They create 
redshift slices for their data cube and place the galaxies into 
the redshift slices according to their photometric redshift 
and error; objects with large errors are removed. The algorithm
then calculates the distance of one galaxy to all others
in the redshift slice, and groups the galaxies that are closer
to each other than a given linking distance, $D_{\mathrm{link}}$ ('friends').
Next it calculates the distance from the new galaxies in the
group (the 'friends') to all other galaxies in the slice and
adds those that are within the linking distance
('friends-of-friends'). The group is complete when there are no more
galaxies to be found within the linking distance to any of the
group members. If the group comprises a number of galaxies
above a specified minimum number, $n_{\min}$, it is a cluster
candidate. Cluster candidates in separate redshift slices that
contain one or more identical galaxy members are linked up
as one and the same cluster candidate. 

Our Friends-Of-Friends algorithm is broadly similar to 
that of Botzler et al. (2004). However, we have made three 
key improvements, which will be discussed below. 

First, to improve computational efficiency, we apply
Delaunay Triangulation (Delaunay 1934) to the field
of galaxies in the redshift slice to identify each galaxy's
nearest neighbours ('Delaunay neighbours'). This procedure
uses the 'divide-and-conquer' method described in Lee &
Schachter (1980), which has a very short computational run
time. Our computation time is thereby greatly reduced:
once we have completed the triangulation, there is no need
to calculate the distance from each galaxy to every other
galaxy in the field, but only to determine the distance to
each galaxy's Delaunay neighbours. Fig. 5 demonstrates the
principle of Delaunay Triangulation: each galaxy is connected
to its nearest neighbours, forming triangles whose
circumcircle contains no other galaxies than the ones that
form the vertices of the triangle itself. 

When the triangulation is complete, a random galaxy
is chosen and the proper distance, $D$, to its neighbours as
linked by the Delaunay triangulation, is calculated from:

$$D = 2 \sin\left(\frac{\theta}{2}\right) D_A, \qquad (10)$$

where $D_A$ is the angular diameter distance of the redshift slice, and $\theta$
is the angle between the galaxies $i$ and $j$ in the tangent-plane
approximation:

$$\theta = \sqrt{\left(\alpha_i \cos(\delta_i) - \alpha_j \cos(\delta_j)\right)^2 + \left(\delta_i - \delta_j\right)^2}. \qquad (11)$$

In this equation $\alpha$ and $\delta$ are the RA and Dec of the galaxies
in units of degrees. Any neighbours for which $D \leq D_{\mathrm{link}}$
are dubbed 'friends' and are added to the group. Next, the
previous step is repeated for the new 'friends', taking only
the galaxies into account that are not yet members of the
group. When there are no more 'Delaunay neighbours' of
any members of the group within linking distance, an as
yet unanalysed galaxy is chosen and the whole process is
repeated. This is illustrated by Fig. 6, where the Delaunay
Triangulation is shown of a galaxy field with an overdensity
superimposed and the iterations of the Friends-Of-Friends
process are colour-coded. When all groups have been found
in the redshift slice, only those with a number of galaxies
greater than $n_{\min}$ are retained. Evidently, the two parameters
in FOF for which a value needs to be chosen are $D_{\mathrm{link}}$
and $n_{\min}$. 
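
The grouping logic can be illustrated with the short sketch below. It is not the implementation described above: for brevity it searches all remaining galaxies by brute force instead of restricting the search to Delaunay neighbours, so it shows the linking rule of Eqs. 10 and 11 rather than the speed-up. The function names are hypothetical, and the angular diameter distance of 1.26e6 kpc used in the example is only an approximate, illustrative value for a slice at z = 0.5.

```python
import math
from collections import deque

def angular_separation_deg(ra_i, dec_i, ra_j, dec_j):
    """Tangent-plane separation of Eq. 11, with all angles in degrees."""
    dx = ra_i * math.cos(math.radians(dec_i)) - ra_j * math.cos(math.radians(dec_j))
    dy = dec_i - dec_j
    return math.hypot(dx, dy)

def proper_distance(theta_deg, d_a):
    """Eq. 10, with theta converted from degrees to radians."""
    return 2.0 * math.sin(math.radians(theta_deg) / 2.0) * d_a

def friends_of_friends(positions, d_a, d_link, n_min):
    """Return sorted index lists of groups with more than n_min members."""
    unvisited = set(range(len(positions)))
    groups = []
    while unvisited:
        seed = unvisited.pop()           # pick an as yet unanalysed galaxy
        group = [seed]
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            # Brute-force neighbour search (assumption: no Delaunay step).
            friends = [j for j in unvisited
                       if proper_distance(
                           angular_separation_deg(*positions[i], *positions[j]),
                           d_a) <= d_link]
            for j in friends:
                unvisited.remove(j)
                group.append(j)
                queue.append(j)
        if len(group) > n_min:
            groups.append(sorted(group))
    return groups

# Three galaxies in a chain plus one isolated galaxy (RA, Dec in degrees).
positions = [(10.0, 0.0), (10.006, 0.0), (10.012, 0.0), (10.5, 0.3)]
groups = friends_of_friends(positions, d_a=1.26e6, d_link=175.0, n_min=2)
```

In this example the two outer galaxies of the chain are about 264 kpc apart, further than the linking distance, yet they end up in the same group because each lies within 175 kpc of the middle galaxy.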

The second important difference between our algorithm 
and previous ones in the literature, such as Botzler et al. 
(2004), is the way we place the galaxies in the redshift slices. 
As we sample the full z-PDF to create MC-realisations of 
the three-dimensional galaxy distribution, we do not need to 
assign errors to individual galaxy redshifts. An object with 
a large redshift error will be distributed throughout many 
different slices in the 500 MC-realisations, and therefore not 
yield a significant contribution to the cluster candidates it is 
potentially found in. Thus there is no need to remove objects 
with large errors from the catalogue and no additional bias 
is introduced against faint objects with noisier photometry. 

Figure 7. A probability map of clusters found by the Voronoi
Tessellation method at redshift $z \sim 1.0$. Colours are normalised
to the highest probability in the field. 

The third modification to existing algorithms is the way 
we link up cluster candidates throughout the redshift slices. 
Instead of comparing individual galaxies in the clusters and 
linking up the clusters with corresponding members (see 
Botzler et al. 2004), we use probability maps of all redshift 
slices to locate likely cluster regions. This is discussed in
Section 2.4. 

2.4 Probability maps and cross-checking 

Once the two cluster selection methods have determined 
the cluster candidates in the redshift slices for all MC- 
realisations, we combine the MC-realisations to create prob- 
ability maps for both methods for each redshift slice. These 
maps are created by calculating the extent of all cluster de- 
tections in RA and Dec according to the positions of the 
cluster members. The regions of the field that are found to 
be in a cluster in many MC-realisations are high-probability 
cluster locations. Fig. 7 shows an example of a probability
map: the VT cluster candidates in this slice at $z = 1.0$ are
contoured and coloured, with black through to red indicating
low to high probability. 

Since the error on the photometric redshifts of the 
galaxies is usually larger than the width of the redshift slices, 
each cluster candidate is typically found in several adjoin- 
ing slices. We join the cluster candidates that occur in the 
same location in several slices by locating the peaks in the 
probability maps and inspecting the area within their con- 
tours in the adjoining redshift slices for cluster candidates. 
This procedure is carried out as follows: per redshift slice, 
starting at the highest detected contour level, we calculate 
the positions of the cluster contours and determine their 
'centres of mass', where each point within the contour is as- 
signed an equal 'mass'. Next, we inspect the contours one 
level down, and verify if any of these are unoccupied by 
any of the previously found centres. If so, this is labelled a 
new cluster (of a lower probability). We continue until we 
have inspected all contour levels down to 0.05 (or 5% of the
number of MC realisations) in all redshift slices. Finally, we
join each cluster centre to the cluster centres in adjoining
redshift slices that lie within 0.5 Mpc in projected distance.
Fig. 8 shows the cumulative number of MC-realisations versus
redshift for one cluster candidate. The redshift limits
of the linking procedure are placed at the slices where the
cluster candidate is no longer found in a significant number
of MC-realisations (i.e. < 2.5% of the MC realisations). The
final cluster redshift is determined by taking the mean of the
redshift slices, weighted by the number of MC-realisations
in which the candidate is detected.

Figure 8. The cumulative number of cluster candidates, which
form the constituents of one particular cluster, versus redshift.
The cluster candidates are linked up between the redshift
boundaries marked by the dashed (red) lines. The final cluster consists
of ~ 540 cluster candidates in different MC-realisations, spread
out over four redshift slices around $z \sim 1$. The maximum number
of constituent candidates would be 2000, if the cluster was
detected in all four slices in all MC-realisations. The dotted line
marks the weighted average redshift of the cluster. 

We assign a reliability factor F to each cluster by count- 
ing the total number of MC-realisations in which it occurs 
in any of the linked-up redshift slices, and dividing this by 
the total of 500 realisations. This means that if a cluster 
candidate occurs in four slices in a single realisation, it is 
only counted once. Therefore the maximum number of re- 
alisations in which it is counted is 500, in which case we 
would have $F = 1.0$. To create the final cluster catalogue,
we cross-check the output of the two detection methods and
select only those clusters that have been found by both VT
and FOF with a reliability factor $F$ above a suitable limit.
This parameter $F_{\mathrm{lim}}$ is dependent on the accuracy of the
photometric redshifts, and the completeness and efficiency
of both detection methods. The higher the chosen limit, the
more efficient yet the less complete the final cluster catalogue
will be. The best level of $F_{\mathrm{lim}}$ is determined by simulating
mock catalogues, taking into account the characteristics of
the data to be used. Below we describe the results of running
the 2TecX algorithm on our simulated catalogues. Based on
these, VB06 used a value of $F_{\mathrm{lim}} = 0.2$ to obtain a reliable
cluster catalogue at $0.5 < z < 1.5$. 
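
The bookkeeping behind the reliability factor and the weighted cluster redshift can be sketched as follows. The data layout (a dictionary mapping each linked redshift slice to the set of MC-realisations in which the candidate was detected) and the function names are assumptions made for illustration, and the example uses 10 realisations instead of 500 to keep the numbers readable.

```python
def reliability_and_redshift(detections, slice_redshifts, n_real=500):
    """Return (F, z_mean) for one cluster linked across several slices.

    detections maps a slice index to the set of MC-realisation indices in
    which the candidate was found in that slice (hypothetical layout).
    """
    # A realisation counts once, however many slices it appears in.
    realisations = set()
    for found in detections.values():
        realisations |= found
    reliability = len(realisations) / n_real
    # Mean slice redshift, weighted by the number of detections per slice.
    weight = sum(len(found) for found in detections.values())
    z_mean = sum(slice_redshifts[k] * len(found)
                 for k, found in detections.items()) / weight
    return reliability, z_mean

def passes_cross_check(f_vt, f_fof, f_lim=0.2):
    """Keep a cluster only if both methods find it with F above the limit."""
    return f_vt > f_lim and f_fof > f_lim

# Illustrative input: a candidate linked across two adjoining slices.
detections = {0: {0, 1, 2}, 1: {2, 3}}
slice_redshifts = {0: 1.0, 1: 1.05}
F, z_mean = reliability_and_redshift(detections, slice_redshifts, n_real=10)
```

Here realisation 2 contains the candidate in both slices but is counted once, so F = 0.4, while the weighted redshift uses all five detections and comes out at z = 1.02. How VT and FOF detections are matched to one another before the cross-check is not covered by this sketch.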
Figure 9. Number counts versus magnitude for VB06's data
catalogue (black) compared to the number counts of the mock
background catalogue (dashed, red). The $5\sigma$ detection limit is
$K_{\mathrm{lim}} = 20.6$. 



3 SIMULATIONS 

3.1 Mock catalogue characteristics 

To test the behaviour of the cluster-detection algorithm and 
to determine the optimal values of the parameters we run 
a set of simulations on mock catalogues. These catalogues 
need to mimic as closely as possible the data to which the 
algorithm will be applied. VB06 describe the application of 
our algorithm to a combined optical/infrared catalogue on 
the Subaru-XMM-Newton Deep Field (SXDF) consisting of
BVRi'z' Subaru SuprimeCam data; JK United Kingdom
InfraRed Telescope (UKIRT) Wide Field CAMera (WFCAM)
data from the UKIRT Infrared Deep Sky Survey
(UKIDSS); and 3.6 and 4.5 μm band data from the Spitzer
InfraRed Array Camera (IRAC). Our mock catalogues are
designed to have the same area and K-band limiting magnitude
as the data catalogue of VB06. Furthermore, when a 
galaxy's z-PDF is needed, this is randomly drawn from the 
collection of z-PDFs used by VB06 that peak at the position 
of the simulated galaxy's redshift. Thus the z-PDFs of the 
simulated data accurately reflect the photometric redshift 
error and the functional form of the z-PDFs in the real data 
catalogue. 
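As an illustration of the z-PDF assignment, the sketch below picks a random member of a library of z-PDFs whose peak coincides with the simulated redshift. The array layout (one PDF per row, sampled on a common redshift grid) and the function name `draw_zpdf` are assumptions made for this example; the storage format used by VB06 is not described here.

```python
import numpy as np


def draw_zpdf(z_sim, pdf_library, z_grid, rng):
    """Pick at random one library z-PDF whose peak lies in the grid bin nearest z_sim.

    pdf_library: (n_pdf, n_z) array, one z-PDF per row (assumed layout).
    z_grid:      (n_z,) redshift grid on which all z-PDFs are sampled.
    """
    peaks = z_grid[np.argmax(pdf_library, axis=1)]      # peak redshift of every PDF
    target = z_grid[np.argmin(np.abs(z_grid - z_sim))]  # grid point nearest to z_sim
    candidates = np.flatnonzero(np.isclose(peaks, target))
    if candidates.size == 0:
        raise ValueError(f"no library z-PDF peaks at z = {target:.2f}")
    return pdf_library[rng.choice(candidates)]
```

Because the returned z-PDF is a real one, its width and any secondary peaks carry over to the simulated galaxy unchanged.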



3.2 Simulating the galaxy background 

We create catalogues with a galaxy background distribution
randomly placed in the field with 0.1 ≤ z ≤ 2.0 (neglecting
clustering of both the background and the clusters). The
galaxy luminosities and number densities are determined
by the K-band Schechter luminosity function of Cole et
al. (2001) with $\Phi^* = 3.7 \times 10^{-3}$ Mpc$^{-3}$, $\alpha = -0.95$, and
$M_K^* = -24.18$. To obtain the correct value for $M_K^*$ we added
0.017 (Hewett et al. 2006) to Cole's original value to account
for the difference between the K-band filters of WFCAM
and 2MASS (used by Cole et al. 2001). Also, we assume
passive evolution of the luminosity function (e.g. Gardner
et al. 1996). We calculate the e+k (evolution and redshifting)
correction to $M_K^*$ at all redshifts by using GALAXEV
to create a stellar population synthesis SED. The SED
consists of a star-burst at z = 4, exponentially decaying with
τ = 1 Gyr, and has solar metallicity. The background
catalogues are created in the following steps:

(i) We slice the three-dimensional field into redshift slices
of Δz = 0.05 over which we assume the luminosity function
to be constant.

(ii) For each slice, we calculate the volume (V), determined
by the angular size of the field and the redshift limits,
and the e+k corrected $M_K^*$.

(iii) The number of simulated galaxies in the slice is cal- 
culated according to the luminosity function: 



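$$N_{\rm gal} = V \times \int_0^\infty \Phi(L)\,dL, \qquad (12)$$

and

$$\Phi(L)\,dL = \Phi^* \mathcal{L}^{\alpha} e^{-\mathcal{L}}\,d\mathcal{L}, \qquad (13)$$

where $\mathcal{L} = L/L^*$ is a dimensionless luminosity.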

(iv) Luminosities are assigned to all galaxies according to
the luminosity function of Eq. (13) and the absolute magnitudes
are determined with:

$$M_K = M_K^* - 2.5 \log \mathcal{L}. \qquad (14)$$



(v) The galaxies are randomly placed in redshift, RA, and 
Dec within the slice according to a uniform distribution. 

(vi) The apparent magnitudes of the simulated galaxies
are calculated:

$$m_K = M_K + 5 \log(D_L) - 5, \qquad (15)$$

where $D_L$ is the luminosity distance to the galaxy in parsec.
We now impose a magnitude limit of $K_{\rm lim} = 20.6$ to match
the 5-σ limit of the data catalogue of VB06. Only the galaxies
with $m_K < K_{\rm lim}$ are retained in the mock catalogue.

The number of galaxies as a function of magnitude in each
mock catalogue is entirely consistent with the number counts
in the data catalogue up to the 5-σ limit, as is shown in
Fig. 9.
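Steps (ii) to (vi) for a single redshift slice can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a flat ΛCDM cosmology with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ (not stated in this section), a comoving slice volume, and a placeholder `ek_correction` that returns zero in place of the GALAXEV correction. It relies on the fact that a Schechter function in $\mathcal{L}$ is a gamma distribution of shape $\alpha + 1$, so that the integral in Eq. (12) equals $\Phi^* \Gamma(\alpha + 1)$ for $\alpha > -1$.

```python
import math
import numpy as np

# Assumed flat LambdaCDM parameters (not stated in this section).
H0, OMEGA_M, OMEGA_L = 70.0, 0.3, 0.7
C_KMS = 299792.458
# Schechter parameters and magnitude limit quoted in the text.
PHI_STAR, ALPHA, M_STAR = 3.7e-3, -0.95, -24.18
K_LIM = 20.6


def comoving_distance(z):
    """Comoving distance in Mpc for scalar or array z <= 2.5 (trapezoidal rule)."""
    grid = np.linspace(0.0, 2.5, 5001)
    inv_e = 1.0 / np.sqrt(OMEGA_M * (1.0 + grid) ** 3 + OMEGA_L)
    steps = 0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(grid)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    return C_KMS / H0 * np.interp(z, grid, cumulative)


def ek_correction(z):
    """Placeholder for the GALAXEV e+k correction to M*_K; returns 0 here."""
    return 0.0


def mock_slice(z_lo, z_hi, area_deg2, rng):
    """Return (z, m_K) of the galaxies retained in one redshift slice."""
    sky_fraction = area_deg2 / (4.0 * math.pi * (180.0 / math.pi) ** 2)
    volume = sky_fraction * 4.0 / 3.0 * math.pi * (
        comoving_distance(z_hi) ** 3 - comoving_distance(z_lo) ** 3)
    # Eqs (12)-(13): N_gal = V * Phi* * Gamma(alpha + 1)
    n_gal = int(round(volume * PHI_STAR * math.gamma(ALPHA + 1.0)))
    # Schechter-distributed dimensionless luminosities (gamma variates);
    # the floor only guards against log10(0)
    lum = np.maximum(rng.gamma(ALPHA + 1.0, 1.0, n_gal), 1e-300)
    m_star = M_STAR + ek_correction(0.5 * (z_lo + z_hi))
    abs_mag = m_star - 2.5 * np.log10(lum)                  # Eq. (14)
    z = rng.uniform(z_lo, z_hi, n_gal)                      # step (v)
    d_l_pc = (1.0 + z) * comoving_distance(z) * 1e6         # luminosity distance in pc
    app_mag = abs_mag + 5.0 * np.log10(d_l_pc) - 5.0        # Eq. (15)
    keep = app_mag < K_LIM
    return z[keep], app_mag[keep]
```

Calling `mock_slice` for every slice of width 0.05 between z = 0.1 and z = 2.0 and concatenating the results would give one background realisation; sky positions, drawn uniformly over the field, are omitted for brevity.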

3.3 Adding mock clusters to the catalogue 

We superimpose simulated clusters on the background cat- 
alogue. To create the mock clusters we take the following 
steps: 

(i) We choose a total cluster mass (including dark matter)
and a mass-to-light ratio of $M\,[M_\odot] / L\,[L_\odot] = 75h$ (Rines
et al. 2001), which is assumed constant in terms of $L^*$ (a
quantity we assume to evolve passively with redshift). To
deduce the total luminosity of the cluster in K-band we
calculate:

$$L^* = 10^{0.4\,(K_\odot - M_K^*)}\,L_\odot, \qquad (16)$$

and therefore the total dimensionless luminosity in units of
$L^*$:

$$\mathcal{L}_{\rm tot} = \frac{M_{\rm tot}\,[M_\odot]}{75h \times 10^{0.4\,(K_\odot - M_K^*)}}. \qquad (17)$$

Here $K_\odot = 3.28$ is the K-band magnitude of the sun and
$M_K^*$ is taken from the cluster luminosity function derived
by Lin, Mohr & Stanford (2004), who found $M_K^* = -24.34$,
$\Phi^* = 3.0$ Mpc$^{-3}$, and $\alpha = -1.1$. Again we assume passive
evolution of the cluster luminosity function with a formation
redshift of $z_{\rm form} = 4$.

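(ii) We calculate the number of galaxies in the cluster by
using Eq. (12) and recognising that:

$$L_{\rm tot} = V \times \int_0^\infty L\,\Phi(L)\,dL. \qquad (18)$$

Together this gives:

$$N_{\rm gal} = L_{\rm tot} \times \frac{\int_0^\infty \Phi(L)\,dL}{\int_0^\infty L\,\Phi(L)\,dL}, \qquad (19)$$

or in units of $L^*$:

$$N_{\rm gal} = \mathcal{L}_{\rm tot} \times \frac{\int_0^\infty \mathcal{L}^{\alpha} e^{-\mathcal{L}}\,d\mathcal{L}}{\int_0^\infty \mathcal{L}^{\alpha+1} e^{-\mathcal{L}}\,d\mathcal{L}}. \qquad (20)$$

Luminosities are assigned to the galaxies according to the
luminosity function of Lin et al. (2004).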

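(iii) The galaxies are spatially distributed within the cluster
according to an NFW profile (Navarro, Frenk & White
1997) with a cut-off radius of 5 Mpc. Assuming galaxies to
be perfect tracers of the dark matter, the galaxy number
density n in the two-dimensional projected NFW profile is
(Bartelmann 1996):

$$n \propto \begin{cases} \dfrac{1}{x^2 - 1}\left[1 - \dfrac{2}{\sqrt{1 - x^2}}\,\mathrm{arctanh}\sqrt{\dfrac{1 - x}{1 + x}}\right] & (x < 1) \\[2ex] \dfrac{1}{3} & (x = 1) \\[2ex] \dfrac{1}{x^2 - 1}\left[1 - \dfrac{2}{\sqrt{x^2 - 1}}\,\arctan\sqrt{\dfrac{x - 1}{x + 1}}\right] & (x > 1) \end{cases} \qquad (21)$$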



Here $x = r/r_s$, where r is the radius in projection. The scale
radius $r_s$ is related to $r_{200}$ (the radius of the sphere within
which the mean density is 200 times the critical density of
the Universe) via $c = r_{200}/r_s$. The concentration factor c has
been determined from numerical simulations by Dolag et al.
(2004) to obey the empirical relation:

$$c\,(1 + z) = c_0 \left(\frac{M}{M_0}\right)^{\alpha} \qquad (22)$$

with $c_0 = 9.59$, $M_0 = 10^{14}\,h^{-1}\,M_\odot$, and $\alpha = -0.1$.
The radius $r_{200}$ is determined by the total mass of the cluster:

$$r_{200} = \left[\frac{M_{200}}{\frac{4}{3}\pi\,200\,\rho_{\rm cr}}\right]^{1/3}, \qquad (23)$$

where for a flat Universe:

$$\rho_{\rm cr} = \frac{3}{8\pi G}\,H_0^2 \left[(1 + z)^3\,\Omega_m + \Omega_\Lambda\right]. \qquad (24)$$

In this equation $H_0$ is expressed in km s$^{-1}$ km$^{-1}$, and G is
the gravitational constant. When simulating elliptical clusters,
we use the radius $r_{200}$ for the profile over one axis,
and the radius $e \times r_{200}$ over the other axis, where e is the
ellipticity expressed as minor axis over major axis.

(iv) The redshifts of the cluster galaxies are randomly
offset from the cluster redshift according to a Gaussian
distribution with σ = 0.05(1 + z), which is the expected
photometric redshift error (see VB06). This error is much larger
than the contribution of the velocities of the galaxies within
the cluster, which allows us to neglect the latter.

(v) Again, we apply the magnitude limit of $K_{\rm lim} = 20.6$
to the apparent magnitudes of the cluster galaxies to obtain
the final catalogue.
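The cluster quantities used in steps (i), (iii) and (iv) can be evaluated with a short sketch. It assumes h = 0.7, $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$, which are not stated in this section, and it expresses G in Mpc (km s$^{-1}$)$^2$ $M_\odot^{-1}$ so that $H_0$ can be given in km s$^{-1}$ Mpc$^{-1}$. The passive evolution of $L^*$ is reduced to an optional magnitude offset `ek` that defaults to zero, and the total mass is used for $M_{200}$. All function names are hypothetical.

```python
import math
import numpy as np

H, OMEGA_M, OMEGA_L = 0.7, 0.3, 0.7   # assumed cosmology (not stated in this section)
G_MPC = 4.30091e-9                    # G in Mpc (km/s)^2 / M_sun
K_SUN, M_STAR_CL = 3.28, -24.34       # values quoted in the text
ML_RATIO = 75.0 * H                   # mass-to-light ratio in solar units


def total_luminosity(m_tot, ek=0.0):
    """Dimensionless total luminosity (in units of L*) for a mass in M_sun."""
    l_star = 10.0 ** (0.4 * (K_SUN - (M_STAR_CL + ek)))  # L* in L_sun
    return m_tot / (ML_RATIO * l_star)


def rho_crit(z):
    """Critical density in M_sun / Mpc^3 for a flat Universe, Eq. (24)."""
    h0 = 100.0 * H
    return 3.0 * h0 ** 2 / (8.0 * math.pi * G_MPC) * (
        OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def r200(m_tot, z):
    """Radius in Mpc enclosing a mean density of 200 rho_crit, Eq. (23)."""
    return (m_tot / (4.0 / 3.0 * math.pi * 200.0 * rho_crit(z))) ** (1.0 / 3.0)


def concentration(m_tot, z, c0=9.59, alpha=-0.1):
    """Concentration from Eq. (22), with M_0 = 1e14 / h M_sun."""
    return c0 / (1.0 + z) * (m_tot / (1e14 / H)) ** alpha


def member_redshifts(z_cl, n_gal, rng):
    """Step (iv): Gaussian scatter with sigma = 0.05 (1 + z)."""
    return rng.normal(z_cl, 0.05 * (1.0 + z_cl), n_gal)
```

Under these assumptions a cluster of $10^{15}\,M_\odot$ at z = 0 has a total luminosity of roughly 170 $L^*$, $r_{200}$ close to 2 Mpc and a concentration near 8, so the scale radius $r_s = r_{200}/c$ is about 0.26 Mpc.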
Figure 10. Example of simulated equatorial fields containing three types of clusters (red) superimposed on a galaxy background (black)
at 0.1 < z < 2.0. Left: Nine clusters at z = 0.2 with total luminosities of $L_{\rm tot}$ = 10, 20, 30, 40, 50, 100, 150, 200, 300 $L^*$. Middle: Nine
clusters of $L_{\rm tot} = 50\,L^*$ at z = 0.2, 0.4, ..., 2.0. Right: Nine clusters of $L_{\rm tot} = 50\,L^*$ at z = 0.2, with ellipticity e = 0.1, 0.2, ..., 1.0 (PA
= 0°).



We create different types of mock cluster catalogues: (i)
a set of clusters with varying mass or total luminosity at
fixed redshift, (ii) a set of clusters of fixed mass at varying
redshifts, and (iii) a set of clusters of fixed mass and redshift,
but with varying ellipticity. The varying mass and redshift
catalogues are created such that each combination of mass
and redshift is represented and all catalogues are recreated
randomly ten times. In Fig. 10 we show the distribution of
galaxies in our three types of catalogue: on the left nine
clusters at z = 0.2 with total luminosities of $L_{\rm tot}$ = 10, 20,
30, 40, 50, 100, 150, 200, 300 $L^*$; in the middle nine clusters
of $L_{\rm tot} = 50\,L^*$ at z = 0.2, 0.4, ..., 2.0; on the right nine
clusters of $L_{\rm tot} = 50\,L^*$ at z = 0.2, with ellipticity e = 0.1,
0.2, ..., 1.0 at a position angle (PA) of 0°.



3.4 Simulation results 

The aim of the simulations is to explore the behaviour of the
FOF and VT detection methods, and to optimise the algorithm's
parameters. The VT and FOF methods each have
two free parameters. For FOF these are the linking distance
in proper coordinates, $D_{\rm link}$, and the minimum number of
galaxies in a cluster, $n_{\rm min}$. Guided by Botzler et al. (2004)
we experimented with values of 0.125 Mpc ≤ $D_{\rm link}$ ≤
0.175 Mpc and 3 ≤ $n_{\rm min}$ ≤ 5. For VT the parameters are
the expected number of groups due to background fluctuations,
$N_{\rm exp}$, and the lower limit on the cell density, $f_{\rm min}$. We
followed the method of Ebeling & Wiedenmann (1993) and
set $N_{\rm exp}$ to 0.1. For $f_{\rm min}$ we tried values of 1.2 to 2.2, where
f = 1.0 equates to the mean cell density of the field. We use
the parameters that give the best completeness of detected
clusters whilst keeping the contamination low: $D_{\rm link} = 0.175$
Mpc, $n_{\rm min} = 5$, and $f_{\rm min} = 1.74$.
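To make the role of the two FOF parameters concrete, the sketch below groups galaxies in a single redshift slice by chaining together all pairs closer than the linking distance and then discards groups with fewer than the minimum number of members. It is a plain two-dimensional friends-of-friends illustration with a hypothetical function name; it does not include the sampling of the z-PDFs that the 2TecX algorithm performs.

```python
import numpy as np


def friends_of_friends(xy_mpc, d_link=0.175, n_min=5):
    """Group points linked by chains of separations no larger than d_link.

    xy_mpc: (N, 2) array of projected proper positions in Mpc within one
    redshift slice. Returns a list of index arrays, one per group with
    at least n_min members.
    """
    xy = np.asarray(xy_mpc, dtype=float)
    n = len(xy)
    parent = np.arange(n)

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        d = np.hypot(*(xy[i + 1:] - xy[i]).T)   # distances to later points
        for j in np.flatnonzero(d <= d_link) + i + 1:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri                 # merge the two groups
    roots = np.array([find(i) for i in range(n)])
    groups = [np.flatnonzero(roots == r) for r in np.unique(roots)]
    return [g for g in groups if len(g) >= n_min]
```

A larger `d_link` merges more galaxies into each group and raises the chance of linking background galaxies, while a larger `n_min` suppresses small chance groupings at the cost of missing poor clusters.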

Now that we have determined each algorithm's optimal
parameters, we test the behaviour of the cluster detection
routine by trying to recover the clusters of the three different
types of mock catalogues described in the previous
section. Fig. 11 shows an example: the left panel contains
the simulated clusters, the middle panel the clusters recovered
by VT, and the right panel the clusters recovered by
FOF. Note that in the left panel the background galaxies
have been removed for clarity; naturally they were present
when running the cluster detection algorithm.

Both methods recover all clusters satisfactorily; there is
no obvious bias with respect to cluster morphology, as the
elliptical clusters are recovered very well by both methods.
However, the recovered shape of the clusters differs between
the two methods: VT tends to pick up more background
galaxies at the edges of the clusters, as the number of
recovered cluster members, $N_{\rm gal}$, in any cluster is sensitive
to the local field density. By contrast, the galaxy members
recovered by FOF are more centrally concentrated; the total
number of recovered galaxies per cluster is consistent
throughout the random realisations of the catalogues. This is
illustrated in Fig. 12, which shows the fraction of cluster
galaxies recovered by both methods for the types of catalogues
shown in Fig. 11. The number of simulated cluster galaxies is
determined both by the cluster's mass and the magnitude limit
at its respective redshift. The difference between the two
methods is particularly noticeable in the middle panel of
Fig. 12, where the recovered fraction of cluster galaxies is
shown versus redshift. As there are few background galaxies in
the high-redshift slices, the fraction of detected galaxies per
cluster declines in the FOF method as there is a smaller chance
of finding background galaxies within the linking distance.
However, the fraction of detected cluster galaxies remains
constant in VT because the algorithm's parameters to estimate
an overdensity are scaled to the background density, which
negates the effect of having fewer background galaxies in the
redshift slice.

With the chosen set of parameters we can calculate the
detection completeness as a function of redshift for clusters
of varying total mass. Fig. 13 shows the result: clusters of
mass $M_{\rm tot} \sim 2 \times 10^{15}\,M_\odot$ are detected with a high
completeness up to z = 1.5, whereas the lower-mass clusters
show rapidly declining completeness at lower redshifts. The
FOF algorithm achieves a higher completeness than VT for
clusters of equal mass; however the contamination of spurious
sources is found to be higher.

Figure 11. Mock clusters as recovered by the Voronoi Tessellations and Friends-Of-Friends algorithms. The left panel shows the
distribution of cluster galaxies in the mock catalogues; note that the background galaxies have been removed from the plot for clarity.
The middle panel shows the clusters as recovered by the VT method, whereas the right panel shows the clusters as recovered by the
FOF method. The three simulated cluster fields (top to bottom) are identical to the ones in Fig. 10, where the top panel contains the
clusters of varying mass, the middle panel the clusters at varying redshift, and the bottom panel the clusters of varying ellipticity.

The effects of contamination of the individual detection
methods can be greatly reduced by cross-checking the
output of both methods. Since the two methods use different
measures to isolate clusters (galaxy density in VT versus
separation in FOF), their false detections do not typically
coincide. Therefore by cross-checking the output of the
two methods and choosing a sensible lower limit for the
reliability factor F, the spurious sources due to biases in the
algorithms disappear, leaving only chance galaxy groupings.
Fig. 14 is an example of this: it shows the cluster candidates
found in all redshift slices by both methods; although there
are spurious detections from both VT and FOF, none are
found by both. In Fig. 15 the efficiency, in terms of the number
of real clusters as a fraction of the total detected clusters,
in all 30 mock catalogues is plotted for either method.
Here all clusters with F ≥ 0.2 are included. The median
efficiency is 0.8 for both methods; none of the spurious sources
are detected by both techniques. Note however that this is
purely an upper limit to the efficiency: for a true estimate
the proper spatial correlation function of both background
galaxies and clusters needs to be taken into account (for an
in-depth discussion of the efficiency for varying cluster mass
and redshift in an accurate spatial model, see the follow-up
paper [Van Breukelen et al. in preparation]). Furthermore,
the quality of the photometric redshifts plays an important
role. As discussed in VB06 and shown in Van Breukelen et
al. (2009), artifacts like redshift spikes can yield a significant
number of spurious sources in the cluster catalogue.

Cross-checking the results of the two methods means that
the completeness is limited to the lower of the two values.
With our chosen set of parameters, and keeping only
the structures found with F > 0.2 in both methods, we can
calculate the mass selection function of our algorithm. This
is shown in Fig. 16 for three levels of completeness.
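The two figures of merit used in this section can be written down in a few lines. The sketch assumes that each detected candidate has already been flagged as real or spurious and that each simulated cluster has been flagged as recovered or missed by each method; the boolean-array layout and the function names are illustrative assumptions.

```python
import numpy as np


def efficiency(is_real):
    """Fraction of detected candidates that correspond to simulated clusters."""
    is_real = np.asarray(is_real, dtype=bool)
    return float(is_real.mean()) if is_real.size else float("nan")


def completeness(recovered):
    """Fraction of simulated clusters that were recovered."""
    recovered = np.asarray(recovered, dtype=bool)
    return float(recovered.mean()) if recovered.size else float("nan")


def cross_checked_completeness(recovered_vt, recovered_fof):
    """A simulated cluster survives the cross-check only if both methods recover it."""
    both = np.asarray(recovered_vt, dtype=bool) & np.asarray(recovered_fof, dtype=bool)
    return completeness(both)
```

Since a cluster must be recovered by both methods, the cross-checked completeness in this sketch can never exceed the lower of the two individual values.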
Figure 12. The fraction of simulated cluster galaxies recovered by both detection methods. The number of simulated cluster galaxies is 
determined by the mass of the cluster as well as the magnitude limit at the cluster's respective redshift. VT systematically overestimates 
the number of galaxies, whereas FOF recovers a more accurate number of galaxies. The top panel shows the recovered fraction as a 
function of cluster luminosity, the middle panel as a function of redshift, and the bottom panel as a function of ellipticity. 



Figure 14. An example of cluster candidates selected in a simulated
catalogue by both detection methods with F ≥ 0.2. The
candidates found by VT are shown in black, the ones found by
FOF in red. Although both methods detect spurious sources, none
are found by both.

Figure 15. The efficiency, in terms of the number of real clusters
as a fraction of the total detected clusters, for either method in
all 30 mock catalogues (black solid line and red dashed line for
VT and FOF respectively). All clusters found with F ≥ 0.2 are
included in this diagram.

The final application of our simulations is to derive a
relationship between the total cluster mass (or luminosity)
and the number of recovered cluster galaxies. As Fig. 12
shows, the number of galaxies found by FOF is much more
consistent and better-behaved than the number of galaxies
detected by VT. Therefore we only use the FOF output
to determine the total cluster mass. This is done by taking
all galaxies that occur in the cluster in > 15% of the
MC-realisations in which the cluster itself is detected. The
galaxies that appear in a smaller fraction of MC-realisations
are very likely to be interlopers from different redshifts.
Calculating $N_{\rm gal}$ for all cluster masses at all redshifts yields
functions of $N_{\rm gal}$ vs. z for constant total mass or luminosity.
These are shown in Fig. 17. The number of detected galaxies
at constant mass declines more steeply than a magnitude-selected
sample would, since the fraction of recovered versus
simulated galaxies for the FOF method becomes smaller
at higher redshift (see Fig. 12). The total cluster mass of
cluster candidates found in real data (see VB06) can be
estimated by overplotting the number of cluster galaxies and
interpolating between the lines of constant cluster mass.
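The interpolation between lines of constant mass could be sketched as below. The sketch assumes that the simulated $N_{\rm gal}(z)$ curves are tabulated on a common redshift grid for an ascending list of masses, that $N_{\rm gal}$ increases with mass at fixed redshift, and that the interpolation is linear in log mass. These are illustrative choices, and the function name is hypothetical.

```python
import numpy as np


def estimate_mass(n_gal, z, z_grid, masses, n_gal_curves):
    """Interpolate the cluster mass from the number of recovered FOF members.

    masses:       ascending 1D array of simulated total masses (M_sun).
    n_gal_curves: n_gal_curves[i] is N_gal sampled on z_grid for masses[i].
    Values outside the simulated range are clamped to the end points.
    """
    # N_gal expected at redshift z for every simulated mass
    n_at_z = np.array([np.interp(z, z_grid, curve) for curve in n_gal_curves])
    # np.interp needs increasing abscissae: N_gal is assumed to rise with mass
    log_m = np.interp(n_gal, n_at_z, np.log10(masses))
    return 10.0 ** log_m
```

Because `np.interp` clamps at the ends of its input range, a candidate with fewer members than the lowest-mass curve predicts is assigned the lowest simulated mass rather than an extrapolated value.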



4 SUMMARY 

To summarise, the main points of this paper are set out 
below. 

We have created a new cluster detection algorithm of 
which the main characteristics are: (i) each galaxy's full red- 
shift probability function is utilised, and (ii) cluster can- 
didates are selected by cross-checking the results of two 
substantially different selection techniques: Voronoi Tessel- 
lations and Friends-Of-Friends. 



Each selection technique is dependent on two parameters.
Voronoi Tessellations uses $f_{\rm min}$, the limiting cell density,
and $N_{\rm exp}$, the maximum expected number of groups
caused by background fluctuations. The parameters of the
Friends-Of-Friends algorithm are $D_{\rm link}$, the linking distance,
and $n_{\rm min}$, the minimum number of galaxies in a group.

Simulations using mock background galaxy catalogues
with clusters superimposed allow us to choose optimum values
for the algorithm's parameters. We use $N_{\rm exp} = 0.1$,
$f_{\rm min} = 1.74$, $D_{\rm link} = 0.175$ Mpc, and $n_{\rm min} = 5$.

Neither selection method shows an obvious bias to clus- 
ter ellipticity. However, the recovered shape of the clusters 
differs for both methods: VT tends to pick up more back- 
ground galaxies at the edges of the clusters; by contrast, 
the galaxy members recovered by FOF are more centrally 
concentrated. 
Figure 16. Cluster mass selection function versus redshift. The
selection function is shown for three completeness (C) levels: 95%
(solid line), 50% (dashed line), and 5% (dotted line).

Figure 17. Constant-mass functions for the number of recovered
cluster members with redshift. The functions plotted are for total
cluster mass of 0.5 (purple), 1.0 (blue), 2.0 (green), 10, and 20 ×
$10^{14}\,M_\odot$ (black).



Cross-checking the output of the Voronoi Tessella- 
tions and the Friends-Of-Friends method eliminates spurious 
sources in the simulated cluster searches. However, low-level 
clustering within the background has not been taken into 
account.

The simulations yield completeness estimates as a function
of redshift and cluster mass; these can be found in
Fig. 13. Furthermore, they provide us with a method of
determining cluster mass, deduced from the number of galaxies
found with the Friends-Of-Friends method and shown in
Fig. 17.